A Sampling-based Tool for Plagiarism Detection in Student Texts

Authors

  • Tuomo Kakkonen
  • Niko Myller
Abstract

This paper introduces AntiPlag, an advanced plagiarism detection tool intended for use on student texts. It supports both hermetic detection, which examines only local collections of documents (other students' texts and lecture materials, for example), and web plagiarism detection, which aims to identify instances of plagiarism sourced from the Internet. The main feature of the system is sampling-based web plagiarism detection, a novel approach that combines web and hermetic search technologies. The system uses standard web search engines to locate documents on the Internet that the writer of a text might have used as sources of plagiarism. During this sampling phase, the suspected sources are downloaded, converted to ASCII text, and saved to the local database so that they can later be processed with the hermetic detection methods. We evaluated the system on a test set that contained instances of verbatim copying as well as texts in which plagiarism was concealed by minor editing, synonym replacement, and paraphrasing. We compared the results achieved by AntiPlag with those of an earlier evaluation study of four web plagiarism detection systems: SafeAssignment, TurnitIn, EVE2, and Plagiarism-Finder. AntiPlag performed better than any of these systems, achieving an accuracy of 95.8% over all test items.
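
The two-phase idea in the abstract (sample the web, then compare locally) can be sketched roughly as follows. This is only an illustration under stated assumptions, not AntiPlag's implementation: the abstract does not say how query samples are chosen or how the hermetic comparison works, so the fixed-stride word windows, the 4-gram containment score, and every name here (`sample_queries`, `collect_sources`, `rank_sources`, and the caller-supplied `search` and `fetch` functions) are hypothetical.

```python
import re
import unicodedata
from typing import Callable, Dict, Iterable, List, Set, Tuple


def words(text: str) -> List[str]:
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def to_ascii(text: str) -> str:
    """Convert text to plain ASCII, dropping characters that cannot be mapped."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def sample_queries(text: str, query_len: int = 6, stride: int = 20) -> List[str]:
    """Assumed sampling rule: take a short word window every `stride` words."""
    toks = words(text)
    last_start = max(len(toks) - query_len, 0)
    queries = [" ".join(toks[i:i + query_len])
               for i in range(0, last_start + 1, stride)]
    return [q for q in queries if q]


def collect_sources(text: str,
                    search: Callable[[str], Iterable[str]],
                    fetch: Callable[[str], str]) -> Dict[str, str]:
    """Sampling phase: query a search engine, download hits, store them as ASCII.

    `search` maps a query to URLs and `fetch` maps a URL to its text;
    both are supplied by the caller (hypothetical interfaces).
    """
    local_db: Dict[str, str] = {}
    for query in sample_queries(text):
        for url in search(query):
            if url not in local_db:
                local_db[url] = to_ascii(fetch(url))
    return local_db


def shingles(text: str, n: int = 4) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a text."""
    toks = words(text)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def rank_sources(text: str, local_db: Dict[str, str],
                 n: int = 4) -> List[Tuple[float, str]]:
    """Hermetic phase (assumed metric): share of the suspect text's n-grams
    that also occur in each stored source, highest first."""
    suspect = shingles(text, n)
    scored = []
    for url, doc in local_db.items():
        score = len(suspect & shingles(doc, n)) / len(suspect) if suspect else 0.0
        scored.append((score, url))
    return sorted(scored, reverse=True)
```

In use, `search` would wrap whatever search engine API is available and `fetch` would download a page and strip its markup; once the suspected sources are in the local store, the comparison step no longer needs network access.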

Similar resources

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to authors' rights, plagiarism detection is one of the critical problems in the field of text mining, and one that interests many researchers. The issue is considered serious in higher academic institutions. There exist language-independent tools, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...

Automatic Student Plagiarism Detection: Future Perspectives

The availability and use of computers in teaching has been accompanied by an increase in the rate of plagiarism among students, owing to the wide availability of electronic texts online. While computer tools that have appeared in recent years are capable of detecting simple forms of plagiarism, such as copy-paste, a number of recent research studies devoted to evaluation and comparison of plagiarism dete...

A Review of Electronic Services for Plagiarism Detection in Student Submissions

Student plagiarism is an ever-increasing problem for academic institutions. A growing number of students are using material from the Web in their submissions, without properly acknowledging the source. This paper reviews the need for widespread plagiarism detection systems and evaluates available Web based detection services. Four services are discussed: the Measure of Software Similarity (MOSS...

An Evaluation of Web Plagiarism Detection Systems for Student Essays

This study uses purpose-built test data and empirical experiments to report on the performance of four web plagiarism detection systems: TurnitIn, SafeAssignment, Plagiarism-Finder and EVE. In addition to measuring accuracy of detection, we evaluated the extent to which these systems produce false detections. We obtained the test data from multiple sources and edited it in several ways to conce...

Detection of Plagiarism in Student Essays

This paper presents two methods for automatic detection of plagiarism in student essays, using Dutch text corpora to show their effectiveness. The first method is based on measuring the overlap in word trigrams between two essays, excluding all trigrams from the assignment text. This method proves efficient and robust, but relies on the availability of the plagiarized source. The second method ...
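
The first method mentioned in this summary, trigram overlap with the assignment text's trigrams excluded, might look roughly like the sketch below. The summary does not say how the overlap is normalized, so dividing by the smaller trigram set is an assumption, and the names `word_trigrams` and `trigram_overlap` are hypothetical.

```python
import re
from typing import Set, Tuple

Trigram = Tuple[str, str, str]


def word_trigrams(text: str) -> Set[Trigram]:
    """Return the set of consecutive word triples in a text (case-insensitive)."""
    toks = re.findall(r"\w+", text.lower())
    return set(zip(toks, toks[1:], toks[2:]))


def trigram_overlap(essay_a: str, essay_b: str, assignment: str = "") -> float:
    """Share of trigrams common to both essays, ignoring any trigram that
    also appears in the assignment text.

    Assumed normalization: divide by the size of the smaller trigram set.
    """
    ignore = word_trigrams(assignment)
    a = word_trigrams(essay_a) - ignore
    b = word_trigrams(essay_b) - ignore
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

Excluding the assignment's trigrams keeps two essays from looking similar merely because both quote the question they were asked to answer.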

Journal:
  • CoRR

Volume abs/1206.6606

Publication date 2009